We present a new pre-trained language model (PLM) for modern Hebrew, termed AlephBERTGimmel, which employs a much larger vocabulary (128K items) than standard Hebrew PLMs before. We perform a contrastive analysis of this model against all previous Hebrew PLMs (mBERT, heBERT, AlephBERT) and assess the effects of larger vocabularies on task performance. Our experiments show that larger vocabularies lead to fewer splits, and that reducing splits is better for model performance, across different tasks. All in all this new model achieves new SOTA on all available Hebrew benchmarks, including Morphological Segmentation, POS Tagging, Full Morphological Analysis, NER, and Sentiment Analysis. Subsequently we advocate for PLMs that are larger not only in terms of number of layers or training data, but also in terms of their vocabulary. We release the new model publicly for unrestricted use.
translated by 谷歌翻译
我们为拉比希伯来语提出了一种新的预训练的语言模型(PLM),称为Berel(Bert bert嵌入了拉比编码的语言)。尽管存在其他PLM用于处理希伯来文本(例如Hebert,Alephbert),但它们都接受了现代希伯来语文本的培训,该文本在其词典,形态学,义学和正学规范方面与犹太人希伯来语有很大的不同。我们通过一组希伯来语同源物来证明贝雷尔在拉比文本上的优越性。我们发布了无限制使用的新模型和同型挑战。
translated by 谷歌翻译
Graph Neural Networks (GNNs) are prominent in handling sparse and unstructured data efficiently and effectively. Specifically, GNNs were shown to be highly effective for node classification tasks, where labelled information is available for only a fraction of the nodes. Typically, the optimization process, through the objective function, considers only labelled nodes while ignoring the rest. In this paper, we propose novel objective terms for the training of GNNs for node classification, aiming to exploit all the available data and improve accuracy. Our first term seeks to maximize the mutual information between node and label features, considering both labelled and unlabelled nodes in the optimization process. Our second term promotes anisotropic smoothness in the prediction maps. Lastly, we propose a cross-validating gradients approach to enhance the learning from labelled data. Our proposed objectives are general and can be applied to various GNNs and require no architectural modifications. Extensive experiments demonstrate our approach using popular GNNs like GCN, GAT and GCNII, reading a consistent and significant accuracy improvement on 10 real-world node classification datasets.
translated by 谷歌翻译
Deep neural networks (DNNs) have greatly impacted numerous fields over the past decade. Yet despite exhibiting superb performance over many problems, their black-box nature still poses a significant challenge with respect to explainability. Indeed, explainable artificial intelligence (XAI) is crucial in several fields, wherein the answer alone -- sans a reasoning of how said answer was derived -- is of little value. This paper uncovers a troubling property of explanation methods for image-based DNNs: by making small visual changes to the input image -- hardly influencing the network's output -- we demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies. Our novel algorithm, AttaXAI, a model-agnostic, adversarial attack on XAI algorithms, only requires access to the output logits of a classifier and to the explanation map; these weak assumptions render our approach highly useful where real-world models and data are concerned. We compare our method's performance on two benchmark datasets -- CIFAR100 and ImageNet -- using four different pretrained deep-learning models: VGG16-CIFAR100, VGG16-ImageNet, MobileNet-CIFAR100, and Inception-v3-ImageNet. We find that the XAI methods can be manipulated without the use of gradients or other model internals. Our novel algorithm is successfully able to manipulate an image in a manner imperceptible to the human eye, such that the XAI method outputs a specific explanation map. To our knowledge, this is the first such method in a black-box setting, and we believe it has significant value where explainability is desired, required, or legally mandatory.
translated by 谷歌翻译
Limited computational budgets often prevent transformers from being used in production and from having their high accuracy utilized. A knowledge distillation approach addresses the computational efficiency by self-distilling BERT into a smaller transformer representation having fewer layers and smaller internal embedding. However, the performance of these models drops as we reduce the number of layers, notably in advanced NLP tasks such as span question answering. In addition, a separate model must be trained for each inference scenario with its distinct computational budget. Dynamic-TinyBERT tackles both limitations by partially implementing the Length Adaptive Transformer (LAT) technique onto TinyBERT, achieving x3 speedup over BERT-base with minimal accuracy loss. In this work, we expand the Dynamic-TinyBERT approach to generate a much more highly efficient model. We use MiniLM distillation jointly with the LAT method, and we further enhance the efficiency by applying low-bit quantization. Our quantized length-adaptive MiniLM model (QuaLA-MiniLM) is trained only once, dynamically fits any inference scenario, and achieves an accuracy-efficiency trade-off superior to any other efficient approaches per any computational budget on the SQuAD1.1 dataset (up to x8.8 speedup with <1% accuracy loss). The code to reproduce this work is publicly available on Github.
translated by 谷歌翻译
Transformer-based language models have become the standard approach to solving natural language processing tasks. However, industry adoption usually requires the maximum throughput to comply with certain latency constraints that prevents Transformer models from being used in production. To address this gap, model compression techniques such as quantization and pruning may be used to improve inference efficiency. However, these compression techniques require specialized software to apply and deploy at scale. In this work, we propose a new pipeline for creating and running Fast Transformer models on CPUs, utilizing hardware-aware pruning, knowledge distillation, quantization, and our own Transformer inference runtime engine with optimized kernels for sparse and quantized operators. We demonstrate the efficiency of our pipeline by creating a Fast DistilBERT model showing minimal accuracy loss on the question-answering SQuADv1.1 benchmark, and throughput results under typical production constraints and environments. Our results outperform existing state-of-the-art Neural Magic's DeepSparse runtime performance by up to 50% and up to 4.1x performance speedup over ONNX Runtime. Source code is publicly available at https://github.com/intel/intel-extension-for-transformers.
translated by 谷歌翻译
结肠镜检查是一种常规门诊手术,用于检查结肠和直肠的任何异常,包括息肉,憩室和结肠结构的狭窄。临床医生的大量时间用于在结肠镜检查过程中拍摄的快照,以维持医疗记录或进一步研究。自动化此步骤可以节省时间并提高流程的效率。在我们的工作中,我们收集了一个由专家注释的过程中的120个结肠镜检查视频和2416张快照的数据集。此外,我们开发了一种基于新颖的,视觉转化器的地标检测算法,该算法可以从结肠镜检查过程中鉴定出关键的解剖标志(阑尾孔,回肠瓣膜/盲肠地标和直肠翻新)。我们的算法在预处理过程中使用自适应伽马校正,以保持所有图像的一致亮度。然后,我们将视觉变压器用作特征提取主链和完全连接的基于网络的分类器头,将给定的框架分为四个类:三个地标或非地标框架。我们将视觉变压器(VIT-B/16)主链与RESNET-101和Convnext-B骨干进行了比较,这些骨干和Convnext-B骨干也接受了类似训练。我们报告了快照的测试数据集上的视觉变压器主链的精度为82%。
translated by 谷歌翻译
最近的几种方法,例如参数有效的微调(PEFT)和模式开发训练(PET),在标签筛选设置中取得了令人印象深刻的结果。但是,它们很难使用,因为它们会受到手动制作的提示的高度可变性,并且通常需要十亿参数语言模型才能达到高精度。为了解决这些缺点,我们提出了SETFIT(句子变压器微调),这是一个有效且迅速的框架,用于对句子变形金刚(ST)进行几次微调。 SetFit首先以对比的暹罗方式对少数文本对进行微调验证的st。然后将所得模型用于生成丰富的文本嵌入,这些嵌入方式用于训练分类头。这个简单的框架不需要任何提示或口头化,并且比现有技术少的参数较少,因此可以实现高精度。我们的实验表明,SetFit通过PEFT和PET技术获得了可比的结果,同时训练的速度更快。我们还表明,SETFIT可以通过简单地切换ST主体来应用于多语言设置。我们的代码可从https://github.com/huggingface/setFit以及我们的数据集获得,网址为https://huggingface.co/setfit。
translated by 谷歌翻译
进化计算(EC)已被证明能够快速训练深人造神经网络(DNNS)来解决增强学习(RL)问题。虽然遗传算法(GA)非常适合利用既不具有欺骗性也不稀疏的奖励功能,但当奖励函数是这些功能时,它会挣扎。为此,在某些情况下,新颖的搜索(NS)已被证明能够超越梯度跟随优化器,而在其他情况下则表现不佳。我们提出了一种新算法:探索 - 探索$ \ gamma $ - 适应学习者($ e^2 \ gamma al $或eyal)。通过保留动态大小的寻求新颖的代理商的利基市场,该算法可以维持人口多样性,并在可能的情况下利用奖励信号并探索其他奖励信号。该算法将GA的剥削能力和NS的勘探能力结合在一起,同时保持其简单性和优雅性。我们的实验表明,在大多数情况下,Eyal在与GA相当的情况下都胜过NS - 在某些情况下,它可以均优于两者。 Eyal还允许用其他算法(例如演化策略和惊喜搜索)代替利用组件(GA)和探索组件(NS)(NS),从而为未来的研究打开了大门。
translated by 谷歌翻译
金融完整性系统中的绩效指标对于维持高效且具有成本效益的操作至关重要。重要的性能指标是假阳性率。无法直接监视此指标,因为我们不确定一旦阻止用户是否不好。我们提出了一种基于调查理论和因果推理方法的统计方法,以估计系统的假阳性率或单个阻止策略。我们还建议一种新的结果匹配方法,在某些情况下,包括经验数据的表现优于其他常用方法。本文中描述的方法可以应用于其他完整性域,例如网络安全。
translated by 谷歌翻译